-
We show that any memory-constrained, first-order algorithm which minimizes d-dimensional, 1-Lipschitz convex functions over the unit ball to 1/poly(d) accuracy using at most $$d^{1.25-\delta}$$ bits of memory must make at least $$\tilde{\Omega}(d^{1+(4/3)\delta})$$ first-order queries (for any constant $$\delta \in [0,1/4]$$). Consequently, the performance of such memory-constrained algorithms is a polynomial factor worse than the optimal $$\tilde{O}(d)$$ query bound for this problem obtained by cutting plane methods that use $$\tilde{O}(d^2)$$ memory. This resolves a COLT 2019 open problem of Woodworth and Srebro.
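To make the trade-off concrete, the two endpoints of the stated bound read as follows (these are instantiations of the bound above, not additional results):

$$\delta = 0:\ \text{at most } d^{1.25} \text{ bits of memory} \ \Rightarrow\ \tilde{\Omega}(d) \text{ queries},$$

$$\delta = 1/4:\ \text{at most } d \text{ bits of memory} \ \Rightarrow\ \tilde{\Omega}(d^{4/3}) \text{ queries}.$$

In particular, any algorithm restricted to roughly linear memory must make at least $$\tilde{\Omega}(d^{4/3})$$ first-order queries, a polynomial gap from the $$\tilde{O}(d)$$ queries achieved by cutting plane methods with $$\tilde{O}(d^2)$$ memory.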
-
We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $$f : \mathbb{R}^d \to \mathbb{R}$$ which is implicitly decomposable as the sum of m unknown non-interacting smooth, strongly convex functions, and we provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square roots of the condition numbers of the components. This complexity bound (which we prove is nearly optimal) can improve almost exponentially on that of accelerated gradient methods, which grows as the square root of the condition number of f. Additionally, we provide efficient methods for solving stochastic, quadratic variants of this multiscale optimization problem. Rather than learn the decomposition of f (which would be prohibitively expensive), our methods apply a clean recursive “Big-Step-Little-Step” interleaving of standard methods. The resulting algorithms use $$\tilde{O}(dm)$$ space, are numerically stable, and open the door to a more fine-grained understanding of the complexity of convex optimization beyond condition number.
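The paper's recursive construction is more involved, but the basic interleaving intuition can be seen on a tiny block-separable quadratic. The sketch below is a minimal illustration only: the curvatures, step sizes, and the one-big-step-per-40-little-steps schedule are assumptions chosen for this example, not parameters from the paper.

```python
import numpy as np

# Toy block-separable quadratic: f(x) = 0.5 * sum_i lam[i] * x[i]^2.
# Coordinates 0-1 form a "stiff" component (curvatures 1e4 and 2.5e3),
# coordinates 2-3 a "flat" component (curvatures 1.0 and 0.25).
# Each component is well conditioned on its own (condition number 4),
# but f as a whole has condition number 4e4.
lam = np.array([1e4, 2.5e3, 1.0, 0.25])

def f(x):
    return 0.5 * np.sum(lam * x ** 2)

def gd(x, eta, steps):
    """Plain gradient descent with a fixed step size."""
    for _ in range(steps):
        x = x - eta * lam * x  # gradient of f is lam * x
    return x

x0 = np.ones(4)

# Baseline: 410 gradient steps sized for the largest curvature (1/1e4).
# This is stable, but makes almost no progress on the flat component.
x_plain = gd(x0, eta=1e-4, steps=410)

# Interleaving (illustrative schedule): one "big" step sized for the flat
# component, then 40 "little" steps sized for the stiff component to damp
# the blow-up that the big step causes on the stiff coordinates.
x = x0
for _ in range(10):
    x = gd(x, eta=1.0, steps=1)    # big step: fast progress on the flat block
    x = gd(x, eta=1e-4, steps=40)  # little steps: re-contract the stiff block

print(f"plain GD,    410 steps: f = {f(x_plain):.2e}")
print(f"interleaved, 410 steps: f = {f(x):.2e}")
```

With its step size tied to the largest curvature, plain gradient descent barely reduces the flat component within the 410-step budget, while the interleaved schedule drives the overall error several orders of magnitude lower; the point of the paper's recursive scheme is to make this kind of interleaving principled for m components and to prove its near-optimality.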
-
We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model. We show that for a broad class of loss functions and parametric families of distributions, the regret of playing a “proper” predictor (one from the putative model class) relative to the best predictor in the same model class has a lower bound scaling at least as $$\sqrt{cn}$$, where c is a measure of the misspecification of the model class relative to the true distribution in terms of total variation distance. In contrast, using an aggregation-based (improper) learner, one can obtain regret $$d \log n$$ for any underlying generating distribution, where d is the dimension of the parameter; we exhibit instances in which this is unimprovable even over the family of all learners that may play distributions in the convex hull of the parametric family. These results suggest that simple strategies for aggregating multiple learners together should be more robust, and several experiments conform to this hypothesis.
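As a minimal sketch of the proper-versus-improper distinction only (the two-model Bernoulli class, the fair-coin data source, and the plug-in choice of proper strategy below are assumptions for illustration, not the paper's construction): the improper learner predicts with a posterior-weighted mixture of the candidate models, which generally lies outside the class, while the proper plug-in learner must commit to a single class member each round.

```python
import numpy as np

# A deliberately coarse "parametric family": two Bernoulli models only.
# The data come from a fair coin, so the class is misspecified (the true
# source is far in total variation from both models).
MODELS = np.array([0.05, 0.95])
N_ROUNDS, N_TRIALS = 2000, 50

def log_loss(p, y):
    return -np.log(p if y == 1 else 1.0 - p)

def run_trial(rng):
    y = rng.integers(0, 2, size=N_ROUNDS)
    log_w = np.zeros(len(MODELS))  # log posterior weights, uniform prior
    loss_improper = loss_proper = 0.0
    for t in range(N_ROUNDS):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        p_mix = float(w @ MODELS)            # improper: mixture of models
        p_map = float(MODELS[np.argmax(w)])  # proper: single model from the class
        loss_improper += log_loss(p_mix, y[t])
        loss_proper += log_loss(p_map, y[t])
        # Bayesian update of the weights with the observed outcome.
        log_w += np.where(y[t] == 1, np.log(MODELS), np.log(1.0 - MODELS))
    best = min(sum(log_loss(p, yt) for yt in y) for p in MODELS)  # hindsight best
    return loss_proper - best, loss_improper - best

rng = np.random.default_rng(0)
regrets = np.array([run_trial(rng) for _ in range(N_TRIALS)])
print(f"avg regret, proper (plug-in):       {regrets[:, 0].mean():.2f}")
print(f"avg regret, improper (aggregation): {regrets[:, 1].mean():.2f}")
```

In this toy, the aggregated predictor's regret is at most $$\log 2$$ on every sequence (the standard mixture bound), while the plug-in predictor's expected regret grows on the order of $$\sqrt{n}$$ under the misspecified source, loosely mirroring the $$\sqrt{cn}$$ versus $$d \log n$$ separation described above.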